To address the current lack of effective development and deployment tools for deep learning applications, a component-based development framework is proposed. The framework splits an application into functions according to the type of resource each one consumes, applies a review-guided resource allocation scheme to eliminate bottlenecks, and places functions with a step-by-step bin-packing scheme that aims for high CPU utilization and low memory overhead. A real-time license plate number detection application built on the framework achieved 82% GPU utilization in throughput-first mode, an average application latency of 0.73 s in latency-first mode, and an average CPU utilization of 68.8% across the three modes (throughput-first, latency-first, and balanced throughput/latency). These results indicate that the framework can balance hardware throughput against application latency, using the platform's computing resources efficiently in throughput-first mode and meeting applications' low-latency requirements in latency-first mode. Compared with MediaPipe, the framework enabled the development of an ultra-real-time multi-person pose estimation application and improved its detection frame rate by up to 1077%. Overall, the experiments show that the framework is an effective solution for developing and deploying deep learning applications on CPU-GPU heterogeneous servers.
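The abstract does not spell out the function placement scheme, so the following is only a minimal sketch of the general idea of packing functions into as few workers as possible, which keeps per-worker CPU utilization high and avoids the memory cost of extra workers. It uses the textbook first-fit-decreasing heuristic as a stand-in, not the framework's actual step-by-step scheme. The `Worker` class, the function names, the CPU demand figures, and the 100% per-worker budget are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker process with a fixed CPU budget (percent of one core)."""
    capacity: float
    functions: list = field(default_factory=list)
    used: float = 0.0

    def fits(self, demand: float) -> bool:
        return self.used + demand <= self.capacity

    def add(self, name: str, demand: float) -> None:
        self.functions.append(name)
        self.used += demand


def first_fit_decreasing(demands: dict, capacity: float = 100.0) -> list:
    """Place each function on the first worker with room, largest demand first.

    This is the generic first-fit-decreasing heuristic, assumed here for
    illustration only; the paper's own placement rule may differ.
    """
    workers = []
    ordered = sorted(demands.items(), key=lambda kv: kv[1], reverse=True)
    for name, demand in ordered:
        if demand > capacity:
            raise ValueError(f"{name} needs {demand}, more than one worker offers")
        for worker in workers:
            if worker.fits(demand):
                worker.add(name, demand)
                break
        else:
            # No existing worker has room, so open a new one.
            new_worker = Worker(capacity)
            new_worker.add(name, demand)
            workers.append(new_worker)
    return workers


if __name__ == "__main__":
    # Made-up CPU demands for CPU-bound functions of a video pipeline.
    demo = {"decode": 60, "resize": 30, "preprocess": 40,
            "postprocess": 25, "encode": 45}
    for i, w in enumerate(first_fit_decreasing(demo)):
        print(i, w.functions, w.used)
```

With these made-up demands, the five functions fit into two workers, each fully using its budget. A real placement would also need measured demands and a memory term, which this sketch leaves out.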